    Meet Charles, big data query advisor

    In scientific data management and business analytics, the most informative queries are a holy grail. Data collection becomes increasingly simpler, yet data exploration gets significantly harder. Exploratory querying is likely to return an empty or an overwhelming result set. On the other hand, data mining algorithms require extensive preparation, ample time and do not scale well. In this paper, we address this challenge at its core, i.e., how to query the query space associated with a given database. The space considered is formed by conjunctive predicates. To express them, we introduce the Segmentation Description Language (SDL). The user provides a query. Charles, our query advisory system, breaks its extent into meaningful segments and returns the subsequent SDL descriptions. This provides insight into the set described and offers the user directions for further exploration. We introduce a novel algorithm to generate SDL answers. We evaluate them using four orthogonal criteria: homogeneity, simplicity, breadth, and entropy. A prototype implementation has been constructed and the landscape of follow-up research is sketched
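    As a rough illustration of how a segmentation built from conjunctive predicates could be scored on the entropy criterion, the sketch below counts the rows matched by each segment and computes the Shannon entropy of the resulting size distribution. The predicate encoding (a dict of inclusive column ranges) and the function names are assumptions made for this example. They are not SDL syntax, and the abstract does not give the paper's exact definition of entropy.

```python
import math


def matches(row, predicate):
    """Check a conjunctive predicate against one row.

    `predicate` maps a column name to an inclusive (low, high) range;
    the row matches only if every range condition holds (a conjunction).
    This encoding is a stand-in chosen for the example, not SDL itself.
    """
    return all(low <= row[col] <= high for col, (low, high) in predicate.items())


def segment_entropy(rows, predicates):
    """Shannon entropy (in bits) of the segment-size distribution.

    Each predicate describes one segment. Balanced segments give a higher
    value; a segmentation dominated by one segment gives a lower one.
    """
    counts = [sum(1 for r in rows if matches(r, p)) for p in predicates]
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)
```

    For instance, two segments that split four rows evenly score 1 bit, while a segmentation that places every row in a single segment scores 0.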

    Fast Cartography for Data Explorers

    X-Device Query Processing by Bitwise Distribution

    The diversity of hardware components within a single system calls for strategies for efficient cross-device data processing. For example, existing approaches to CPU/GPU co-processing distribute individual relational operators to the “most appropriate” device. While pleasantly simple, this strategy has a number of problems: it may leave the “inappropriate” devices idle while overloading the “appropriate” device and putting high pressure on the PCI bus. To address these issues, we distribute data among the devices by partially decomposing relations at the granularity of individual bits. Each of the resulting bit-partitions is stored and processed on one of the available devices. Using this strategy, we implemented a processor for spatial range queries that makes efficient use of all available devices. The performance gains achieved indicate that bitwise distribution makes a good cross-device processing strategy.
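    The idea of decomposing values into bit-partitions can be sketched in a few lines. The example below is a single-process approximation under stated assumptions: values are non-negative integers, there are only two partitions (high bits and low bits), and the "devices" are simulated as two phases of one function. The names `split_bits` and `range_query` are hypothetical and do not come from the paper.

```python
def split_bits(values, low_bits):
    """Decompose each non-negative integer into a high and a low bit-partition."""
    mask = (1 << low_bits) - 1
    high = [v >> low_bits for v in values]  # coarse approximation of each value
    low = [v & mask for v in values]        # residual bits needed for exact answers
    return high, low


def range_query(high, low, low_bits, lo, hi):
    """Return the indices i whose original value satisfies lo <= value <= hi."""
    lo_h, hi_h = lo >> low_bits, hi >> low_bits

    # Phase 1 (imagined on one device): coarse filter using only the high bits.
    candidates = [i for i, h in enumerate(high) if lo_h <= h <= hi_h]

    # Phase 2 (imagined on another device): only candidates on the boundary
    # of the coarse range need the low bits to decide membership.
    result = []
    for i in candidates:
        h = high[i]
        if lo_h < h < hi_h:
            result.append(i)  # strictly inside the coarse range, no refinement
        else:
            value = (h << low_bits) | low[i]  # reassemble the full value
            if lo <= value <= hi:
                result.append(i)
    return result
```

    In this sketch the first phase touches only the high-bit partition, so the partition holding the low bits is consulted for boundary candidates alone. How the actual system assigns partitions to devices and schedules the phases is not described in the abstract.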

    Blaeu: Mapping and navigating large tables with cluster analysis

    Blaeu is an interactive database exploration tool. Its aim is to guide casual users through large data tables, ultimately triggering insights and serendipity. To do so, it relies on a double cluster analysis mechanism. It clusters the data vertically: it detects themes, groups of mutually dependent columns that highlight one aspect of the data. Then it clusters the data horizontally. For each theme, it produces a data map, an interactive visualization of the clusters in the table. The data maps summarize the data. They provide a visual synopsis of the clusters, as well as facilities to inspect their content and annotate them. But they also let the users navigate further. Our explorers can change the active set of columns or drill down into the clusters to refine their selection. Our prototype is fully operational, ready to deliver insights from complex databases
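    The vertical step, grouping mutually dependent columns into themes, might look roughly like the following. This sketch assumes numeric columns, uses absolute Pearson correlation as the dependency measure, and merges columns with a union-find structure. The abstract does not say which dependency measure or clustering method Blaeu uses, so all of these are illustrative choices, and `column_themes` is a hypothetical name.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists (0.0 if constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def column_themes(table, threshold=0.8):
    """Group columns whose absolute correlation reaches `threshold`.

    `table` maps column names to lists of numbers. Columns are linked
    transitively, so each returned group is a connected component.
    """
    names = list(table)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(table[a], table[b])) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())
```

    Each resulting group of columns could then be clustered row-wise (the horizontal step) to produce one map per theme.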

    Scalable Generation of Synthetic GPS Traces with Real-life Data Characteristics

    Database benchmarking is most valuable if real-life data and workloads are available. However, real-life data (and workloads) are often not publicly available due to IPR constraints or privacy concerns. And even if available, they are often limited regarding scalability and variability of data characteristics. On the oth

    Have a chat with Clustine, conversational engine to query large tables

    Thanks to the recent advances in AI and the stellar popularity of messaging apps (e.g., WhatsApp), chatbots are no longer bound to customer support services and computer museums. Indeed, they offer a mighty, lightweight and accessible way to provide services over the Internet. In this paper, we introduce Clustine, a chatbot to help users query large tables through short messages. The main idea is to combine cluster analysis and text generation to compress query results, describe them with natural language and make recommendations. We present the architecture of our system, demonstrate it with two use cases, and present early validation experiments with 12 real datasets to show that its promises are reachable.

    Combining design and performance in a data visualization management system

    Interactive data visualizations have emerged as a prominent way to bring data exploration and analysis capabilities to both technical and non-technical users. Despite their ubiquity and importance across applications, multiple design- and performance-related challenges lurk beneath the visualization creation process. To meet these challenges, application designers either use visualization systems (e.g., Endeca, Tableau, and Splunk) that are tailored to domain-specific analyses, or manually design, implement, and optimize their own solutions. Unfortunately, both approaches typically slow down the creation process. In this paper, we describe the status of our progress towards an end-to-end relational approach in our data visualization management system (DVMS). We introduce DeVIL, a SQL-like language to express static as well as interactive visualizations as database views that combine user inputs modeled as event streams and database relations, and we show that DeVIL can express a range of interaction techniques across several taxonomies of interactions. We then describe how this relational lens enables a number of new functionalities and system design directions and highlight several of these directions. These include (a) the use of provenance queries to express and optimize interactions, (b) the application of concurrency control ideas to interactions, (c) a streaming framework to improve near-interactive visualizations, and (d) techniques to synthesize interactive interfaces tailored to end-users
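    To give a feel for the relational view of interaction, the sketch below models a brushing interaction in plain SQL through Python's built-in `sqlite3` module: user inputs are rows in an event table, and the visualized selection is a view joining the data with the latest event. This is ordinary SQL rather than DeVIL, whose syntax is not shown in the abstract, and the table, view and function names are invented for the example.

```python
import sqlite3


def build_db():
    """Create an in-memory database with data, an event stream, and a view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE points(id INTEGER PRIMARY KEY, x REAL, y REAL);
        -- Each brush gesture is appended as one event row.
        CREATE TABLE brush_events(ts INTEGER PRIMARY KEY, x_min REAL, x_max REAL);
        -- The selection is a view over the data and the most recent event.
        CREATE VIEW selected_points AS
        SELECT p.id, p.x, p.y
        FROM points AS p
        JOIN (SELECT x_min, x_max FROM brush_events
              ORDER BY ts DESC LIMIT 1) AS b
          ON p.x BETWEEN b.x_min AND b.x_max;
        """
    )
    return conn


def brush(conn, ts, x_min, x_max):
    """Record a brush interaction as a new event."""
    conn.execute("INSERT INTO brush_events VALUES (?, ?, ?)", (ts, x_min, x_max))


def selected_ids(conn):
    """Read the current selection from the view."""
    rows = conn.execute("SELECT id FROM selected_points ORDER BY id")
    return [row[0] for row in rows]
```

    Because the selection is defined declaratively, recording a new event is enough to change what the view returns; no application code recomputes the selection by hand.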